Multilingwis – Explore Your Parallel Corpus

نویسندگان

  • Johannes Graën
  • Dominique Sandoz
  • Martin Volk
چکیده

We present Multilingwis2, a web based search engine for exploration of word-aligned parallel and multiparallel corpora. Our application extends the search facilities by Clematide et al. (2016) and is designed to be easily employable on any parallel corpus comprising universal part-of-speech tags, lemmas and word alignments. In addition to corpus exploration, it has proven useful for the assessment of word alignment quality. Loading the results of different alignment methods on the same corpus as different corpora into Multilingwis2 alleviates their comparison. Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-137129 Originally published at: Graën, Johannes; Sandoz, Dominique; Volk, Martin (2017). Multilingwis2 – Explore Your Parallel Corpus. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, Gothenburg, Sweden, 22 May 2017 24 May 2017, 247-250.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

متن کامل

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains.In this paper, we introduce a novel platform for parallel computing by using MPI and OpenMP programming languages based on set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...

متن کامل

Building Parallel Corpora for SMT System: A Case Study of English-Manipuri

The Statistical Machine Translation (SMT) systems are developed using sentence aligned parallel corpus. The difficulty is that there is no parallel corpus at the required measure for many language pairs. The preparation of large scale parallel corpus takes time and demands the linguistics skill. In the present work, the various issues of a quality parallel corpus and a technique that extracts p...

متن کامل

Can Crowds Build parallel corpora for Machine Translation Systems?

Corpus based approaches to machine translation (MT) rely on the availability of parallel corpora. In this paper we explore the effectiveness of Mechanical Turk for creating parallel corpora. We explore the task of sentence translation, both into and out of a language. We also perform preliminary experiments for the task of phrase translation, where ambiguous phrases are provided to the turker f...

متن کامل

The AMARA Corpus: Building Resources for Translating the Web’s Educational Content

In this paper, we introduce a new parallel corpus of subtitles of educational videos: the AMARA corpus for online educational content. We crawl a multilingual collection community generated subtitles, and present the results of processing the Arabic–English portion of the data, which yields a parallel corpus of about 2.6M Arabic and 3.9M English words. We explore different approaches to align t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017